We give a general formula for the number of occurrences of a pattern, or set of patterns, in
the class of ordered (plane-planted) trees with a given number of edges. The proof is
combinatorial. Many known enumerations of ordered and binary trees are special cases of this
formula.


1. Introduction 


An ordered (or ‘‘plane-planted’’) tree is a tree in which the order of the outgoing
edges of each node is significant. The degree of a node is the number of outgoing
edges it has. By T_n we denote the class of ordered trees with n edges; the number
of trees in T_n is the well-known Catalan number

\[ C_n = \frac{1}{n+1}\binom{2n}{n} \approx \frac{4^n}{(n+1)\sqrt{\pi n}}. \]


We draw trees with the root at the top and with outgoing edges pointing downwards. 
For example, the five trees in T_3 are


[figure: the five ordered trees with three edges]


A ‘‘pattern’’ is like an ordered tree except that it also contains ‘‘open’’ and ‘‘closed’’
slots. For example, the pattern

[figure]

occurs wherever a node has a grandchild through its youngest child. The slots in the
pattern are depicted as triangles and match any subtree, including the trivial
(single-node) tree. An open slot is depicted as an unshaded triangle hanging off an edge
(where a node would otherwise be); a closed slot, as a shaded triangle hanging off
a node (like an edge). Slots may not be adjacent in a pattern.

We are interested in enumerating occurrences of patterns in classes of ordered 
trees. For example, the above pattern occurs five times in the above class T_3, twice
in the first tree (once at the root and once at its only child), twice in the second (once 
for each of the root’s two grandchildren), and once in the fourth. Formally, we have 
four cases: 


(1) The pattern • occurs at any leaf (that is, end vertex of degree 0).

(2) An open slot △ or closed slot ▲ occurs at any nonempty subtree (of at least
one node).

(3) If a pattern p occurs at a tree t, then the pattern consisting of a new root with a
single edge down to p occurs at the tree consisting of a new root with a single edge
down to t.

(4) If p occurs at t and p' occurs at t', and p || p' is a legal pattern (i.e. has no
adjacent slots), then p || p' occurs at t || t'.


The composite pattern p || p' is obtained by merging the roots of two patterns,
with p to the left and p' to the right; similarly, t || t' is the result of merging the roots
of two trees t and t'. Thus, p || p' appears at t || t' if the latter can be decomposed
into two subtrees, with p occurring at the left part t and p' occurring at the right
part t'.

Each occurrence of a pattern p in a tree t defines a one-to-one correspondence
from the nodes in p (including nodes at the top of closed slots) into the nodes of
t (cases (1) and (3)) and from edges in p (including those from which an open slot
hangs) into those in t (case (3)), which preserves the edge-incidence relation. The
number of occurrences of p in t is the number of distinct correspondences. For
example, the pattern

[figure]

occurs four times in the tree

[figure]

once for each grandparent-grandchild relationship (three times at the root and once
at its oldest child).

Closed slots act like a variable number (including zero) of open slots. The 
distinction between open and closed slots becomes important when considering 
occurrences of more than one pattern. To denote a multiset of patterns, we write
{n_1 * p_1, n_2 * p_2, ..., n_k * p_k}, where n_i is the number of instances of the pattern p_i in
the multiset. A multiset of patterns occurs in a tree if each of its individual patterns
occurs and the nodes of their occurrences are disjoint. An occurrence of such a
multiset of patterns in a tree t defines a one-to-many correspondence from each
node in a pattern p_i to n_i nodes of t and from each edge in p_i to n_i edges in t, such
that the incidence relation is preserved. The number of occurrences is the number
of such correspondences. Note that one pattern may occur at the same subterm as
that matched by an open slot of another, but that a tree node corresponding to the
root of a closed slot cannot match any other node in the patterns (since that would
make one node of the tree correspond to more than one pattern node). For example,
the multiset of two of the above grandparent-grandchild patterns occurs six times 


among the four trees 


once in the first tree, twice in the second, and three times in the third. It does not 
occur in the fourth tree at all, since any two such relations share the grandparent 
node. 


2. Enumeration formula 
Our main result is the following: 
Theorem 2.1. The total number of occurrences of a multiset

{n_1 * p_1, n_2 * p_2, ..., n_k * p_k}

of patterns among all ordered trees with n edges is

\[ \frac{1}{n-e+d+1} \binom{n-e+d+1}{n+1-v,\ n_1,\ n_2,\ \ldots,\ n_k} \binom{2n-e-v+c}{n-e}, \]

where e is the total number of edges in the patterns, v is the total number of nodes
in the patterns, c is the total number of closed slots, and d is the total number of
open slots.

The second factor is the multinomial coefficient

\[ \frac{(n-e+d+1)!}{(n+1-v)!\, n_1!\, n_2! \cdots n_k!} \]

and is taken to be 0 when n+1 < v. Note that v + d = e + n_1 + ... + n_k.
For example, the number of occurrences of the multiset

[figure]

of patterns (three leaves, two nodes of degree at least two, and one leaf at level two
or below) in the class T_8 of 1430 ordered trees with eight edges (n=8, k=3, n_1=3,
n_2=2, n_3=1, e=6, v=8, c=6, and d=4) is

\[ \frac{1}{7}\binom{7}{1,\ 3,\ 2,\ 1}\binom{8}{2} = \frac{1}{7}\cdot 420\cdot 28 = 1680. \]
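The formula of Theorem 2.1 is easy to evaluate mechanically. The following is a minimal sketch, not part of the original paper; the function names `multinomial`, `distribute`, and `occurrences` are illustrative choices. The last factor is written as a stars-and-bars count of the n-e remaining edges over the c+n+1-v closed slots, which agrees with the binomial coefficient in the theorem and also covers the boundary case n = e.

```python
from math import comb, factorial

def multinomial(total, parts):
    """total! / (parts[0]! * parts[1]! * ...); 0 if the parts are invalid."""
    if any(p < 0 for p in parts) or sum(parts) != total:
        return 0
    result = factorial(total)
    for p in parts:
        result //= factorial(p)
    return result

def distribute(edges, slots):
    """Ways to spread `edges` extra edges over `slots` closed slots (stars and bars)."""
    if edges == 0:
        return 1
    return comb(slots + edges - 1, edges)

def occurrences(n, counts, e, v, c, d):
    """Theorem 2.1 for ordered trees with n edges.

    counts = [n_1, ..., n_k]; e, v, c, d are the total numbers of edges,
    nodes, closed slots and open slots in the multiset of patterns.
    """
    m = n - e + d + 1        # number of patterns arranged on the cycle
    free_nodes = n + 1 - v   # tree nodes not covered by any pattern
    if n < e or free_nodes < 0:
        return 0
    arrangements = multinomial(m, [free_nodes] + list(counts))
    return arrangements * distribute(n - e, c + free_nodes) // m

# The worked example of the text: n=8, (n_1, n_2, n_3) = (3, 2, 1).
print(occurrences(8, [3, 2, 1], e=6, v=8, c=6, d=4))
```

With no patterns at all (`counts=[]`, all other parameters zero) the same function returns the Catalan numbers.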


Proof of Theorem 2.1. The patterns leave n+1-v of the nodes in a tree
unrestricted, for each of which a single closed slot pattern ▲ is added. The
Σ n_i + n+1-v = n-e+d+1 patterns can be arranged in sequence in

\[ \binom{n-e+d+1}{n+1-v,\ n_1,\ n_2,\ \ldots,\ n_k} \]

ways. The c+n+1-v closed slots now present can each be replaced with an open
slot pattern of the form

[figure]

for some i, i ≥ 0, in

\[ \binom{2n-e-v+c}{n-e} \]

ways, such that among themselves the new patterns contain the n-e edges
unaccounted for in the given patterns.
Each of the

\[ \binom{n-e+d+1}{n+1-v,\ n_1,\ n_2,\ \ldots,\ n_k}\binom{2n-e-v+c}{n-e} \]


such sequences is placed on a cycle; each such cycle of patterns can be put together 
in a unique way to form a tree. To see this, we adapt the Cycle Lemma in [9] (see 


Patterns in trees 245 


[7]). There are n-e+d open slots among the n-e+d+1 patterns. If n-e+d>0,
then there must be at least one slotless pattern p that is followed by a pattern g with 
slots. Removing p from the cycle and grafting it into the leftmost slot of q leaves 
a cycle with one less pattern and one less slot. Continuing to graft in this manner 
until no slots remain, one ends up with a single (slotless) tree. Since the order of 
grafting does not change the outcome, and filling one slot does not affect the filling 
of another, the resultant tree is unique. 

Thus, of every n-e+d+1 sequences of patterns that give the same cycle, exactly
one can be coalesced, by grafting adjacent patterns, to yield a tree. Our formula
follows. Since each distribution of edges into the closed slots corresponds to a
different occurrence, the formula enumerates occurrences, not trees. Since only open
slots are filled, the patterns do not share nodes. □
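The counting step in the proof can be checked in the simplest setting, where every pattern is a single node with a given number of open slots. This sketch is an illustration of the classical Cycle Lemma under that assumption and is not the grafting procedure of the proof itself; the function names are hypothetical. A list of n+1 node degrees summing to n is read in preorder, and exactly one of its n+1 cyclic rotations should describe an ordered tree.

```python
def is_preorder_degree_sequence(degrees):
    """True if `degrees`, read left to right, lists the node degrees of one ordered tree in preorder."""
    open_slots = 1  # one slot is waiting for the root
    for deg in degrees:
        if open_slots == 0:
            return False  # the tree is already complete, yet nodes remain
        open_slots += deg - 1  # the node fills one slot and opens `deg` new ones
    return open_slots == 0

def good_rotations(degrees):
    """Start indices of the cyclic rotations that form a valid tree."""
    size = len(degrees)
    return [start for start in range(size)
            if is_preorder_degree_sequence(degrees[start:] + degrees[:start])]

# Five nodes, four edges: only the rotation starting at index 1 is a tree.
print(good_rotations([0, 2, 0, 2, 0]))
```

Dividing the number of sequences by their length, as the theorem does with the factor 1/(n-e+d+1), relies on exactly this one-in-a-cycle property.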


Alternatively, this theorem can be proved using generating functions [30], with
Lagrange inversion playing a role analogous to that of the Cycle Lemma (see, e.g.,
[11]).

The above formula counts trees whenever there can be at most one occurrence of 
the multiset of patterns in a tree. It generalizes the following known enumerations: 


- (Harary, Prins, and Tutte [15]) the Catalan number

\[ \frac{1}{n+1}\binom{2n}{n} \]

for unrestricted ordered trees with n edges (k=0, e=v=c=d=0);
- (Cayley [3]) the Catalan number

\[ \frac{1}{2r+1}\binom{2r+1}{r} \]

for unrestricted binary (degree-2) trees with r binary nodes and r+1 leaves (n=2r,
k=2, n_1=r, n_2=r+1, e=d=2r, v=2r+1, c=0);
- (Tutte [31]) the multinomial formula

\[ \frac{1}{n+1}\binom{n+1}{n_0,\ n_1,\ \ldots,\ n_n} \]

for enumerating trees with n_i nodes of degree i and a total of n edges (k=n,
e=d=n, v=n+1, c=0);
- (Flajolet and Steyaert [10]) the binomial formula

\[ \binom{2n-2e+d-1}{n-e} \]

for occurrences of a single pattern with no closed slots (and no leaves), containing
d open slots and e edges, among all ordered trees with n edges (k=1, n_1=1,
v=e-d+1, c=0) and the binomial formula


246 N. Dershowitz, S. Zaks 


\[ \binom{2r-u+1}{r+1} \]

for occurrences of a single pattern, containing u binary nodes and no leaves, among
all binary trees with r binary nodes (n=e=2r, k=2, n_1=1, n_2=r-u, v=r, c=0,
d=2r-u+1).


3. Applications 


The pattern enumeration formula of the previous section has wide applicability. 
We give here some corollaries and representative illustrations, using the following 
notations: T_n for the class of ordered trees with n edges; B_r^t for the subclass of T_n
in which all r internal (nonleaf) nodes are of degree t; B_r = B_r^2 is the class of binary
trees. In the following, all trees in T_n, B_r^t, or B_r are assumed equiprobable. We let
A_i denote the pattern for a node of degree i:

[figure]

and A_{≥i} the pattern:

[figure]


See [12] for a survey of tree enumerations. We make free use of identities in 
[18, 23] for evaluating combinatorial expressions in what follows. 


3.1. Tree enumeration 


When there can be at most one occurrence of a multiset of patterns per tree, our
formula counts trees. This is the case, in particular, when the patterns cover all the
nodes in the tree (i.e. v = n+1), there is at most one closed slot at any pattern node,
and no two different patterns can occur at the same node (i.e. they do not ‘‘overlap’’
each other). Then, we have

\[ \frac{1}{m}\binom{m}{n_1,\ \ldots,\ n_k}\binom{2n-2e+s-m}{n-e} \]

trees in T_n composed of m patterns {n_1 * p_1, ..., n_k * p_k}, m = Σ n_i, containing a
total of s slots of either kind (s=c+d) and e edges. (If there are no closed slots,
the last factor is 0 or 1.) More generally, inclusion/exclusion arguments can often
be used to enumerate trees.


3.1.1. According to Theorem 2.1, the total number of occurrences of the patterns
{i * A_{≥b}} in T_n is

\[ \frac{1}{n+1}\binom{n+1}{i}\binom{2n-ib}{n} \]

(n_1=v=c=i, e=d=ib). Thus, the number of trees in T_n, all of whose nodes have
degree less than b, is given by inclusion/exclusion:

\[ \frac{1}{n+1}\sum_{i\ge 0}(-1)^i\binom{n+1}{i}\binom{2n-ib}{n} \]

(cf. [17]). For b=3, this gives the number of ‘‘unary-binary’’ trees containing only
leaves, unary (degree-1) nodes, and binary (degree-2) nodes:

\[ \frac{1}{n+1}\sum_{i\ge 0}(-1)^i\binom{n+1}{i}\binom{2n-3i}{n} = \sum_{k\ge 0}\frac{1}{k+1}\binom{2k}{k}\binom{n}{2k}. \]

These are the same as the numbers for polygon partitions appearing in [20] (see [8]).
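The inclusion/exclusion count of 3.1.1 can be compared against a direct recursive count for small n. The sketch below is an illustration only; `bounded_degree_trees` evaluates the alternating sum, and `brute_force` (a hypothetical helper, not from the paper) counts ordered trees with n edges and all degrees below b by recursion on the root degree.

```python
from functools import lru_cache
from math import comb

def bounded_degree_trees(n, b):
    """Ordered trees with n edges whose nodes all have degree < b (inclusion/exclusion)."""
    total = sum((-1) ** i * comb(n + 1, i) * comb(2 * n - i * b, n)
                for i in range(n + 2) if 2 * n - i * b >= 0)
    return total // (n + 1)

def brute_force(n, b):
    """The same count by recursion: a tree is a root plus an ordered forest of subtrees."""
    @lru_cache(maxsize=None)
    def tree(edges):
        # Root degree k uses k edges; the k subtrees share the remaining ones.
        return sum(forest(k, edges - k) for k in range(min(b, edges + 1)))

    @lru_cache(maxsize=None)
    def forest(k, edges):
        if k == 0:
            return 1 if edges == 0 else 0
        return sum(tree(j) * forest(k - 1, edges - j) for j in range(edges + 1))

    return tree(n)

# b = 3 gives the unary-binary trees.
print([bounded_degree_trees(n, 3) for n in range(7)])
```

For b = 3 the values should coincide with the right-hand sum over k in the last display of 3.1.1.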


3.1.2. The number of trees in T_n with exactly l leaves is equal to the number of
occurrences of {l * A_0, (n+1-l) * A_{≥1}}. Since the patterns cover all the nodes, we
can let n_1=l, n_2=e=n+1-l, m=n+1, and s=2e in the above formula, and
obtain

\[ \frac{1}{n+1}\binom{n+1}{l}\binom{n-1}{l-1}. \]

This enumeration appears in [21] in the context of ballots, in [23] in reference to
a communication problem, and in [5,24] for trees.
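As a quick sanity check of 3.1.2, the counts by number of leaves should add up to the Catalan number for each n. This is a minimal sketch with an illustrative function name, valid for n ≥ 1.

```python
from math import comb

def trees_with_leaves(n, leaves):
    """Ordered trees with n >= 1 edges and exactly `leaves` leaves (formula of 3.1.2)."""
    if not 1 <= leaves <= n:
        return 0
    return comb(n + 1, leaves) * comb(n - 1, leaves - 1) // (n + 1)

# The five trees with three edges: one path, three trees with two leaves, one star.
print([trees_with_leaves(3, leaves) for leaves in range(1, 4)])
```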


3.1.3. The number of ‘‘reduced’’ ordered trees in T_n with l leaves, having no unary
nodes, is equal to the number of occurrences of {l * A_0, (n+1-l) * A_{≥2}}. By letting
n_1=l, n_2=n+1-l, e=2(n+1-l), and s=3(n+1-l), one obtains

\[ \frac{1}{n+1}\binom{n+1}{l}\binom{l-2}{n-l}. \]

Summing this for all possible n, one gets a total of

\[ \frac{1}{l}\sum_{n\ge l}\binom{n}{l-1}\binom{l-2}{n-l} = \frac{1}{l}\sum_{k\ge 0}\binom{k+l}{l-1}\binom{l-2}{k} = \frac{1}{2}\sum_{k\ge 0}\frac{1}{k+1}\binom{2k}{k}\binom{l+k-1}{2k} \]

l-leaf reduced ordered trees. These numbers were investigated by Schröder [29];
their relation to trees appears in [18, p. 587]. For given n, the total number of
reduced trees is

\[ \frac{1}{n+1}\sum_{l}\binom{n+1}{l}\binom{l-2}{n-l} = \frac{1}{n+1}\sum_{k\ge 1}\binom{n+1}{k}\binom{n-k-1}{k-1}. \]

Alternatively, one can count each occurrence of m unary nodes A_1 (0 ≤ m ≤ n) in a
tree (e=s=d=n_1=m) and use inclusion/exclusion. That gives

\[ \frac{1}{n+1}\sum_{m=0}^{n}(-1)^m\binom{n+1}{m}\binom{2n-2m}{n-m} = \sum_{m=0}^{n}(-1)^m\binom{n}{m}C_{n-m} \]

reduced trees with n edges, which is equal to the previous expression. These
enumerations are also related to the Motzkin numbers (see [8, 25]).


3.1.4. The total number of trees in B_r^t is [17] equal to the number of occurrences
of {r * A_t, ((t-1)r+1) * A_0} in T_{tr}:

\[ \frac{1}{tr+1}\binom{tr+1}{r} \]

(letting n=e=tr, n_1=r, n_2=(t-1)r+1, and s=d=tr). Grunert [14] gives the
analogous result for polygons.


3.2. Single pattern 


The enumeration formula is substantially simpler when there is exactly one
pattern p. Setting k and n_1 to 1, we get

\[ \frac{1}{n-e+d+1}\binom{n-e+d+1}{n+1-v,\ 1}\binom{2n-e-v+c}{n-e} = \binom{2n-2e+s-1}{n-e} \]

for the number of occurrences of an e-edge pattern containing s slots (of any kind) in
T_n (cf. [10]).


3.2.1. The expected number of nodes of degree d in a tree in T_n is

\[ \frac{\binom{2n-d-1}{n-1}}{\frac{1}{n+1}\binom{2n}{n}}. \]

(Let p=A_d, e=s=d.) This enumeration appears in [5]. Considering the
degree-zero case, the expected number of leaves (or internal nodes, for that matter) is
½(n+1). The latter result appears also in [4].
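The expectation in 3.2.1 can be evaluated exactly with rational arithmetic. In this sketch (function names are illustrative, and n ≥ 1 with 0 ≤ d ≤ n is assumed), the expected counts over all degrees must add up to the n+1 nodes of the tree, and the degree-zero case must give ½(n+1).

```python
from fractions import Fraction
from math import comb

def catalan(n):
    """Number of ordered trees with n edges."""
    return comb(2 * n, n) // (n + 1)

def expected_nodes_of_degree(n, d):
    """Expected number of degree-d nodes in a uniformly random ordered tree with n edges."""
    return Fraction(comb(2 * n - d - 1, n - 1), catalan(n))

# Expected number of leaves in a tree with five edges.
print(expected_nodes_of_degree(5, 0))
```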


3.2.2. The expected distance between nodes in a tree in T_n can be found using the
pattern

[figure]

(s=2e+1). There are \binom{2n}{n-e} occurrences of such a pattern and e such patterns for
given distance e; hence, the expected distance is [26]

\[ \frac{\sum_{e} e\cdot e\binom{2n}{n-e}}{\binom{n+1}{2}C_n} = \frac{1}{2}\cdot\frac{4^n}{\binom{2n}{n}} \approx \frac{1}{2}\sqrt{\pi n}. \]


3.3. Root pattern


A pattern p can be constrained to appear at the root of a tree by enumerating its
occurrences and then subtracting the number of occurrences of the pattern


for nonroot occurrences. Using the above formula for a single pattern, we get

\[ \binom{2n-2e+s-1}{n-e} - \binom{2n-2e+s-1}{n-e-1} = \frac{s}{2n-2e+s}\binom{2n-2e+s}{n-e} = \frac{s}{n-e}\binom{2n-2e+s-1}{n-e-1} \]

for the number of occurrences of an s-slot, e-edge rooted pattern in T_n. This
formula counts trees whenever the pattern can occur only once at the root, i.e. when
it has at most one closed slot at each of its nodes.


3.3.1. The number of ordered trees in T_n with root degree r is the number of root
occurrences of A_r (e=r), viz.

\[ \frac{r}{n}\binom{2n-r-1}{n-1}. \]

It follows that the expected root degree of a tree in T_n is

\[ \frac{\sum_{r} r^2\binom{2n-r-1}{n-1}}{nC_n} = \frac{3n}{n+2}. \]


This enumeration appears in [33]; alternative proofs are given in [6,28] (see also 
[1]). Higher moments can also be calculated; for example, the variance of the root 


degree is 
2n-r-1 
3 
raat n-i ) al 


nC, n+2 
of 2n eee, 
eee n+3 n+3 -| 3n ij 
as n a 
n+1 ( n 
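The root-degree distribution of 3.3.1 can be tabulated exactly for small n. The following sketch uses illustrative function names and assumes n ≥ 1; it computes the number of trees with each root degree and derives the mean and variance numerically, without relying on any closed form for the variance.

```python
from fractions import Fraction
from math import comb

def trees_with_root_degree(n, r):
    """Ordered trees with n >= 1 edges whose root has degree r (1 <= r <= n)."""
    return r * comb(2 * n - r - 1, n - 1) // n

def root_degree_moments(n):
    """Return (number of trees, mean root degree, variance of root degree)."""
    counts = {r: trees_with_root_degree(n, r) for r in range(1, n + 1)}
    total = sum(counts.values())
    mean = Fraction(sum(r * c for r, c in counts.items()), total)
    second = Fraction(sum(r * r * c for r, c in counts.items()), total)
    return total, mean, second - mean ** 2

total, mean, variance = root_degree_moments(3)
print(total, mean, variance)
```

The mean returned here can be compared with the value 3n/(n+2) stated in the text.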


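3.3.2. Using the root pattern


(e=l, s=2l+1), we can determine the expected number of nodes on level l of a tree
in T_n:

\[ \frac{\frac{2l+1}{2n+1}\binom{2n+1}{n-l}}{C_n}. \]

Thus, the expected level of a node in T_n is

\[ \frac{\sum_{l} l\,\frac{2l+1}{2n+1}\binom{2n+1}{n-l}}{(n+1)\,C_n} = \frac{1}{2}\cdot\frac{4^n}{\binom{2n}{n}} - \frac{1}{2} \approx \frac{1}{2}\sqrt{\pi n}. \]
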
Generating function derivations of this result have been given by Volosin [32], Meir 
and Moon [19], and Dasarathy and Yang [4]. 


3.3.3. With a similar pattern, the expected number of leaves on level l of a tree in
T_n is found to be

\[ \frac{\frac{l}{n}\binom{2n}{n-l}}{C_n}. \]

It also follows that the expected level of a leaf in T_n is

\[ \frac{1}{2}\cdot\frac{4^n}{\binom{2n}{n}} \approx \frac{1}{2}\sqrt{\pi n}. \]

The ‘‘external path length’’ of a tree (as defined in [18]) is the sum of the levels of
its leaves. Thus, the expected external path length is approximately ¼(n+1)√(πn). In
a similar manner, the expected level of an internal node and expected ‘‘internal path
length’’ (sum of the levels of its nonleaf nodes) can be computed (see [6]).


3.3.4. The pattern

[figure]

(e=l+d, s=2l+d) counts the total number of nodes in T_n of degree d on level l:

\[ \frac{2l+d}{2n-d}\binom{2n-d}{n+l}. \]

This result was first proved by Dershowitz and Zaks [5]; alternative proofs were
given by Dershowitz and Zaks [6], Ruskey [26], and Kemp [16]. Thus, the expected
degree of a node on level l of a tree in T_n is [5]

\[ \frac{\sum_{d} d\,\frac{2l+d}{2n-d}\binom{2n-d}{n+l}}{\frac{2l+1}{2n+1}\binom{2n+1}{n-l}} = \frac{2l+3}{2l+1}\cdot\frac{n-l}{n+l+2} \approx 1+\frac{2}{2l+1}. \]
n 


3.4. Fixed-arity trees 


Frequently, one wishes to enumerate patterns in the subclass B_r^t of t-ary trees in
T_{tr}. This can be done by adding an appropriate number of patterns A_0 and A_t,
constraining all nonpattern nodes to be of degree zero or t. In particular, for a single
t-ary pattern p, setting n=e=tr, and letting u and w denote the number of internal
nodes and open slots in p (respectively), one obtains the formula

\[ \frac{1}{t(r-u)+w+1}\binom{t(r-u)+w+1}{1,\ r-u,\ (t-1)(r-u)+w}\binom{0}{0} = \binom{t(r-u)+w}{r-u} \]

for the number of occurrences of p in B_r^t. Note that, in the fixed-arity case, there
is no point in using closed slots (cf. [10]).
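The fixed-arity formula reduces to a single binomial coefficient, which makes it easy to test on degenerate patterns. In this sketch (function names are illustrative), two boundary cases are treated as assumptions of the check rather than statements from the paper: a pattern that is a single leaf (u=0, w=0) should occur once per leaf, and a pattern that is a bare open slot (u=0, w=1) should occur once per node.

```python
from math import comb

def t_ary_occurrences(t, r, u, w):
    """Occurrences of a t-ary pattern with u internal nodes and w open slots
    among all t-ary trees with r internal nodes."""
    if r < u:
        return 0
    return comb(t * (r - u) + w, r - u)

def t_ary_trees(t, r):
    """Number of t-ary trees with r internal nodes (3.1.4)."""
    return comb(t * r + 1, r) // (t * r + 1)

# Binary trees with two internal nodes: 2 trees, 3 leaves and 5 nodes each.
print(t_ary_occurrences(2, 2, 0, 0), t_ary_occurrences(2, 2, 0, 1))
```

Each t-ary tree with r internal nodes has (t-1)r+1 leaves and tr+1 nodes, so the two boundary cases should equal those quantities multiplied by the number of trees.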


3.4.1. The number of left children having right leaves among the binary trees in B_r
is equal to the number of occurrences of the pattern


By letting t=2 and u=w=2 in the formula, one obtains

\[ \binom{2r-2}{r-2} = \binom{2r-2}{r}. \]

This formula is derived in [2] using generating functions (see also [13]).


3.5. Rooted fixed-arity patterns 


The same technique as for ordered trees may be used to restrict a pattern in a t-ary
tree to appear at the root. This gives us the following enumeration of t-ary trees
containing a pattern with u internal nodes and w open slots:

\[ \binom{t(r-u)+w}{r-u} - t\binom{t(r-u)+w-1}{r-u-1} = \frac{w}{r-u}\binom{t(r-u)+w-1}{r-u-1}. \]


3.5.1. The number of leaves in B_r^t on level l is t^l times the number of occurrences
of the rooted pattern


(u=l, w=tl-l). Thus, the expected external path length of a tree in B_r^t is


UP /tr-l-} 
eD Firat) 


~l\ r-l-1 


te 
r a 
In particular, for binary trees (t=2), this simplifies to
4 +2r 


1 (7) 
r+i\r 


(see [18, Section 2.3.4.5]). 


—3r+1=(r+1)/nr—-3r+1 


3.5.2. The number of ordered forests containing w t-ary trees with a total of n
internal nodes (of degree t) may be determined by counting the number of occurrences
of the rooted pattern


in Bi,.,,. It is 
\[ \frac{w}{tn+w}\binom{tn+w}{n}. \]


4. Conclusion 


We have presented general-purpose formulae for the enumeration of occurrences
of patterns in ordered, binary, and t-ary trees, and demonstrated their flexibility.
It is perhaps instructive to note some of the counting problems for which the
formulae are not well-suited. We cannot, for example, express extremum conditions,
such as ‘‘lowest’’ node. Nor can we, in general, demand symmetry, i.e. that
different slots be filled by identical subtrees.